Epigenetic OCT4 regulatory network: stochastic analysis of cellular reprogramming

Experimental studies have shown that chromatin modifiers have a critical effect on cellular reprogramming, i.e., the conversion of differentiated cells to pluripotent stem cells. Here, we develop a model of the OCT4 gene regulatory network that includes genes expressing chromatin modifiers TET1 and JMJD2, and the chromatin modification circuit on which these modifiers act. We employ this model to compare three reprogramming approaches that have been considered in the literature with respect to reprogramming efficiency and latency variability. These approaches are overexpression of OCT4 alone, overexpression of OCT4 with TET1, and overexpression of OCT4 with JMJD2. Our results show more efficient and less variable reprogramming when also JMJD2 and TET1 are overexpressed, consistent with previous experimental data. Nevertheless, TET1 overexpression can lead to more efficient reprogramming compared to JMJD2 overexpression. This is the case when the recruitment of DNA methylation by H3K9me3 is weak and the methyl-CpG-binding domain (MBD) proteins are sufficiently scarce such that they do not hamper TET1 binding to methylated DNA. The model that we developed provides a mechanistic understanding of existing experimental results and is also a tool for designing optimized reprogramming approaches that combine overexpression of cell-fate specific transcription factors (TFs) with targeted recruitment of epigenetic modifiers.

Ω D tot .Considering the assumption that there is approximately one nucleosome for every 200 base pairs ( [1], Chapter 4), and taking into account that an average gene spans 10 × 10 3 − 20 × 10 3 base pairs [2], the average value of D tot can be considered to range between 50 and 100.Specifically, in all simulations, we assume D tot to be 50.Furthermore, in our simulations we also assume k A M /Ω = 0.008 h −1 .Then, an interval of time of t = 21 days = 504 h corresponds to τ = 504 • 0.008 • 50 = 201.6.

Reprogramming based on different levels of OCT4 overexpression
Here, we conducted several simulations using SSA to analyze the trajectory of the active chromatin state of the OCT4 gene, n A O , starting from a fully repressed state (n A O = n A T = n A J = 0), for different levels of OCT4 overexpression, that is, ūO > 0 (Figure 2).
The results of this analysis are shown in Supplementary Figure 2.For low values of ū0 , none of the trajectories reach the active state in the time span observed, corresponding to 21 days.Then, by increasing ūO some trajectories reach the OCT4 gene active state, but the latency variability does not seem to be reduced by increasing the overexpression level of OCT4 (the input ūO ) among the trajectories.
These results indicates that the higher the OCT4 level is, the more efficient the OCT4 reactivation process is. 1 3 Reprogramming approaches based on constant OCT4 overexpression and transient overexpression of TET1 and/or JMJD2 Here, for each epigenetic modifier TET1 and JMJD2, we compared the efficiency %O A of the two reprogramming approaches, in which, together with OCT4 constant overexpression, the epigenetic modifier is either constantly or transiently overexpressed.Let us start from the case in which TET1 is overexpressed and let us define the initial level of TET1 overexpresssion as ū0 T .Then, we modeled a constant overexpression of TET1 as ūT = ū0 T , as done in the "Results" section, while we modeled a transient TET1 overexpression as ūT = ū0 T e −ε d τ , where ε d is the normalized dilution rate constant.
Similarly, defining the initial level of JMJD2 overexpresssion as ū0 J , we modeled a constant overexpression of JMJD2 as ūJ = ū0 J , while we modeled a transient JMJD2 overexpression as ūJ = ū0 J e −ε d τ .The results of this computational analysis show that the transient overexpression of TET1 or JMJD2, although less effective than constant overexpression, enhances the efficiency of the OCT4 reactivation process (Supplementary Figure 7 and Supplementary Figure 9).Additionally, the effectiveness of transient overexpression of epigenetic modifiers becomes more comparable to constant overexpression when the initial level of overexpression is higher (Supplementary Figure 8 and Supplementary Figure 10).
We then analyzed the effect of sequential, transient overexpression of TET1 and JMJD2.Specifically, we studied the sequential process in which we first introduce TET1 and then JMJD2.This is because, in practical scenarios, we are likely to encounter a parameter regime where the enhancement of DNA methylation establishment by H3K9me3 is not highly effective, that is low r (see "Results" section).Since r is low, adding JMJD2 to erase repressive histone modifications, and, with them, the enhancement of DNA methylation establishment, is not effective unless DNA methylation is quickly erased.A fast erasure of DNA methylation can be achieved by adding TET1.Our computational analysis reveals that when the initial level of transient overexpression for both TET1 and JMJD2 is sufficiently high and matches the level of constant overexpression for either TET1 or JMJD2, the sequential transient overexpression of TET1 followed by JMJD2 can be nearly as effective as constant TET1 overexpression and more effective than JMJD2 constant overexpression (Supplementary Figure 11, right-hand side plot).Moreover, when comparing the TET1-JMJD2 sequential transient overexpression with an initial overexpression level of both JMJD2 and TET1 that is higher compared to the initial level of constant TET1 or JMJD2 overexpression (Supplementary Figure 12), TET1 -JMJD2 sequential, transient overexpression can show higher efficiency compared to both constant TET1 overexpression and constant JMJD2 overexpression.This is practically relevant because a high overexpression level that is not tolerated by the cells for extended amount of time, as in constitutive overexpression, may be well tolerated if lasting only a short amount of time.

Computational analysis by introducing in the model the binomial partitioning of molecules at the end of the cell cycle
In our current model, we describe the effect of dilution due to cell growth and division as an effective decay reaction with first-order kinetics, which is one of the standard models used [3].However, different models may be implemented to more accurately incorporate the cell cycle into the dilution process.Here, we develop a more elaborated model in which, instead of introducing the dilution of a generic species X as a 1 st order reaction X → ∅, with rate constant δ > 0, we explicitly introduce the growth rate and division of cells over time and then the binomial partitioning of molecules at the end of the cell cycle.Then, we study the new model by using the Adiabatic Time-Dependent Gillespie (ATG) approach [4].Compared to the standard Gillespie algorithm used for our original model, in the ATG the volume is not constant and the propensity functions that depend on the volume become a function of time.More precisely, we assume that the volume grows exponentially with time until it doubles and divides.The key steps of the ATC algorithm can be summarized as follows: let us assume that, within the volume Ω(t), X 1 , ..., X n species interact through reactions R 1 , ...R m .Furthermore, let us introduce the state vector x = [x 1 , x 2 , ..., x j , ..., x n ], in which x j corresponds to the molecular count of species X j , j = 1, ..., n.Then, the propensity function of reaction R i , i = 1, ..., m, that is, the probability that reaction R i occurs in the time interval (t, t + dτ ], can be written as a i (x, t)dτ , and the change in the molecular count associated to R i can be written as ∆ i .Now, Step A) Set t = 0 and initial values for x 1 , ..., x n .
Step B) Set the initial volume Ω(t) = Ω(0), the growth rate δ > 0 and the doubling time Step C) Initialize the number of cell division, c, to 0, i.e., c ← 0.
Step F) Compute τ , that is, the time interval until the next reaction, by using the formula .
Step G) Find the next reaction R i that will occur by taking i to be the positive integer such Step H) For the simulations, we considered all the reactions listed in Figure 1b of the paper, except for the 1st order reactions modeling dilution, which we removed.As for the reactions with a rate constant δ ′ > 0, which represent the effective passive erasure of DNA methylation resulting from the balance between dilution and the maintenance process through DNMT1, we also removed them.Instead, we introduced two additional reactions (D → D R 1 and D R 2 → D R 12 ) with a rate constant of k DN M T 1 M > 0 to keep in our model the DNA methylation maintenance process via DNMT1.Our computational study shows that, while the time trajectories obtained with the new model are less smooth compared to the ones obtained with our original model, the trend with which the overexpression of TET1 and JMJD2 affect the OCT4 reprogramming efficiency and latency variability is not significantly affected (Supplementary Figure 14 and Supplementary Figure 15).O (total amount of nucleosomes modified with activating chromatin modifications for the OCT4 gene) starting from the OCT4 fully active state for different values of μ.In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 2. In particular, we set μ = 1, 0.15, 0.07, ε d = 0.3, pO = pT = pJ = 3.2, η = 0.1, μ′ = 1, ε e = 0.3 and ε ′ = 1.In our model, parameter μ quantifies the asymmetry between the erasure rates of repressive histone modifications and activating histone modifications.Mathematical definition of μ can be found in Eq. ( 2).For all simulations, we implemented the reactions listed in Figure 1 with the SSA [5] and we considered a time span of 21 days (τ = 201.6)and D tot = 50 (see Supplementary Note 1).
The parameter values used for these simulations can be found in Supplementary table 2. In particular, we consider three values of ūO (i.e., ūO = 160, 220, 320), and we set ūT = 0, ūJ = 0, μ′ = 0.5, ε d = 0.2, ε e = 0.2, pO = pT = pJ = 5, η = 0.1, μ = 1 and ε ′ = 1 (See Supplementary Figure 13 for how the parameters μ′ and p influence the impact of ūO on the reprogramming process).For all simulations, we implemented the reactions listed in Figure 1     T .In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 6.In particular, we consider two values of ū0 T (ū 0 T = 160, 320), three values of μ′ (μ ′ = 1, 0.5, 0.2), three values of ūO (ū O = 160, 320, 480), and we set ε = 0.3, in which ε = ε d = ε e , p = 2.5, in which p = pO = pT = pJ , η = 0.1, μ = 1 and ε ′ = 1.In all the plots, we represented %O A for the case in which only OCT4 is overexpressed (yellow curve).For all simulations, we implemented the reactions listed in Figure 1  T .In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 6.In particular, we consider two values of ū0 J (ū 0 J = 160, 480), three values of μ′ (μ ′ = 1, 0.5, 0.2), three values of ūO (ū O = 160, 320, 480), and we set ε = 0.3, in which ε = ε d = ε e , p = 2.5, in which p = pO = pT = pJ , η = 0.1, μ = 1 and ε ′ = 1.In all the plots, we represented %O A for the case in which only OCT4 is overexpressed (yellow curve).For all simulations, we implemented the reactions listed in Figure 1  O ≥ 40, starting from n A O = 0, and fL , that is, the normalized frequency of a specific latency value across all the N = 100 simulations.In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 8.In particular, we consider three values of ūT (i.e., ūT = 0, 50, 160), and we set ūO = 320, ūJ = 0, μ′ = 0.5, ε d = 0.2, ε e = 0.2, pO = pT = pJ = 5, μ = 1 and ε ′ = 1.For all simulations, we implemented the reaction systems described in Supplementary Note 4 with the ATG alghoritm [4] and we considered a time span of 21 days (τ = 201.6)and D tot = 50 (see Supplementary Note 1).In our model, D tot represents the total number of nucleosomes within a gene of interest.Assuming about one nucleosome per 200 bp [1] and assuming that an average gene spans between 10,000 and 20,000 bp [2], then the value of D tot can be considered, on average, between 50 and 100.In our computational study, we set D tot = 50.
In our model, we describe the effect of dilution due to cell growth and division as an effective decay reaction with first-order kinetics, which is one of the standard models used [3].The assumption supporting this model is that cells grow exponentially in size [2], with growth rate constant δ > 0. This means that species' concentration decays exponentially through a first-order reaction, with half-life that can be written as t hl = ln(2)/δ.Furthermore, if we also assume that, on average, the species concentration is evenly divided at cell division, then the half-life is equal to the cell cycle length T (i.e., t hl = T ), where T is the cell cycle length.The decay rate constant can then be written as δ = ln(2)/T .Given that the mammalian cell cycle length is 16 -24 hr [2,6], we consider δ = 0.04 (corresponding to T = 17.33 h) as an intermediate value for δ.
Furthermore, we set η = 0.1, and then δ ′ = 0.1δ, based on experimental data showing the high efficiency of the DNA methylation maintenance process through DNMT1 [7].
In [8], Hanna et al. show that the latency of silenced gene reactivation exhibits significant variability.Based on the original study conducted to characterize the chromatin modification circuit used here [9], it is observed that the reactivation process exhibits high latency variability within a parameter regime where ε d is small.We then set k A M /Ω = 0.008 h −1 , so that, in the range of δ values considered, ε d = (δ)/( k A M Ω D tot ) spans between 0.06 and 0.3.Concerning the rate of the recruited erasure process k A E /Ω, it is well known that the rates of enzymatic reactions characterized by highly specific enzyme-substrate binding tend to be higher than those of enzymatic reactions involving non-specific enzyme-substrate binding and removal through dilution resulting from cell growth [10].Furthermore, previous studies show that varying the parameter ε ′ does not significantly affect the trends with which the other parameters affect the stochastic behavior of the chromatin modification circuit considered in this paper, unless ε ′ is much smaller than ε d and ε e [9].Thus, we set , that is, ε ′ is always larger than the values of ε e and ε d considered in our simulations.
Finally, recent experimental data show that the DNA methylation erasure process is significantly slower compared to histone modification erasure [11].Thus, we set values for the erasure rates of DNA methylation always equal or lower compared to the erasure rates of repressive histone modifications, that is, setting k′ * T and kR 2 such that μ′ /μ = k′ * T / kR 2 ≤ 1. Regarding the other parameters, multiple parameter regimes have been considered in order to better understand how different parameters affect the stochastic properties of the system.

1 :
H3K9me3 erasure rate does not have a relevant impact on the dynamics of OCT4 during differentiation.Time trajectories of n A

Supplementary Figure 2 :
Higher levels of OCT4 overexpression increase the efficiency of the reprogramming process.Upper plots: time trajectories of n A O (total amount of nucleosomes modified with activating chromatin modifications for the OCT4 gene) starting from the OCT4 fully repressed state (n A O = 0) for different values of ūO .Intermediate plots: %O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0. Lower plots: histogram plot showing the probability of a specific latency value across all the N = 100 simulations.In all plots, on the x axis we have the time normalized with respect to k

3 :
with the SSA [5] and we considered a time span of 21 days (τ = 201.6)and D tot = 50 (see Supplementary Note 1Effect of the parameter ε d on efficiency and latency variability of reprogramming through overexpression of OCT4 for different values of ūO , ε e , μ′ , and p. %O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0, for different values of ε d .In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 3.In particular, we consider three values of ε d (ε d = 0.3, 0.1, 0.06), three values of μ′ (μ ′ = 1, 0.5, 0.2), three values of ε e (ε e = 0.3, 0.1, 0.06), three values of p (p = 1, 2.5, 5), in which p = pO = pT = pJ , three values of ūO (ū O = 160, 320, 480), and we set η = 0.1, μ = 1 and ε ′ = 1.Definitions of the parameters ūO , ε d , ε e , μ′ , pO , pJ , and pT is given in the "Model of the epigenetic OCT4 gene regulatory network" subsection.For all simulations, we implemented the reactions listed in Figure 1 with the SSA [5], we considered a time span of 21 days (τ = 201.6),and D tot = 50 (see Supplementary Note 1).

6 : 7 : 8 :
Comparison of the effect of JMJD2 overexpression (ū J ) and TET1 overexpression (ū T ) on efficiency and latency variability of reprogramming through overexpression of OCT4 for different values of r and µ ′ .%O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0, for different values of r and µ ′ .In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 4. In particular, we consider three values of r (r = 5, 1, 0.2), three values of μ′ (μ ′ = 1, 0.5, 0.1), three values of ū0 (ū 0 = 100, 160, 320), in which either ūT = ū0 (blue lines) or ūJ = ū0 (red lines), three values of ε (ε = 0.3, 0.1, 0.06), in which ε = ε d = ε e , and we set ūO = 160, p = 5, η = 0.1, μ = 1 and ε ′ = 1.Definitions of the parameters ūO , ūT , ūJ , r and μ′ is given in the "Model of the epigenetic OCT4 gene regulatory network" subsection.For all simulations, we implemented the reactions listed in Figure 1 with the SSA [5], we considered a time span of 21 days (τ = 201.6),and D tot = 50 (see Supplementary Note 1).Comparison of the effect of constant and transient TET1 overexpression on efficiency and latency variability of reprogramming through overexpression of OCT4 for different values of ūO , ε d , ε e , μ′ , and p. %O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0, for different parameter values.In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 5.In particular, we consider three values of μ′ (μ ′ = 1, 0.5, 0.2), three values of ε (ε = 0.3, 0.1, 0.06), in which ε = ε d = ε e , three values of p (p = 1, 2.5, 5), in which p = pO = pT = pJ , three values of ūO (ū O = 160, 320, 480), and we set ū0 T = 160, η = 0.1, μ = 1 and ε ′ = 1.In all the plots, we represented %O A for the case in which only OCT4 is overexpressed (yellow curve).Definitions of the parameters ūO , ūT , ε d , ε e , μ′ , pO , pJ , and pT is given in the "Model of the epigenetic OCT4 gene regulatory network" subsection.For all simulations, we implemented the reactions listed in Figure1with the SSA [5], we considered a time span of 21 days (τ = 201.6),and D tot = 50 (see Supplementary Note 1).The effectiveness of transient overexpression of TET1 becomes more comparable to constant overexpression when the initial level of overexpression is higher.%O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0, for different values of ū0

10 :
with the SSA [5], we considered a time span of 21 days (τ = 201.6),and D tot = 50 (see Supplementary Note 1).The effectiveness of transient overexpression of JMJD2 becomes more comparable to constant overexpression when the initial level of overexpression is higher.%O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0, for different values of ū0

13 :Supplementary
with the SSA [5], we considered a time span of 21 days (τ = 201.6),and D tot = 50 (see Supplementary Note 1).How the level of OCT4 overexpression (ū O ) affects the reprogramming process for different values of μ′ and p. %O A , that is, the normalized amount of N = 100 time trajectories which reach n A O ≥ 40, starting from n A O = 0, for different values of ūO .In all plots, on the x axis we have the time (days).The parameter values used for these simulations can be found in Supplementary table 7.In particular, we consider three values of ūO (ū O = 160, 320, 480), three values of μ′ (μ ′ = 1, 0.5, 0.2), two values of p (p = 2.5, 5), in which p = pO = pT = pJ , and we set ε = 0.3, in which ε = ε d = ε e , η = 0.1, μ = 1 and ε ′ = 1.For all simulations, we implemented the reactions listed in Figure 1 with the SSA [5], we considered a time span of 21 days (τ = 201.6),and D tot = 50 (see Supplementary Note 1).Figure 14: Effect of TET1 overexpression (ū T ) on the reprogramming process through overexpression of OCT4 by simulating the model including the binomial partitioning of molecules at the end of the cell cycle (Supplementary Note 4).Top plots: time trajectories of n A O (total amount of nucleosomes modified with activating chromatin modifications for the OCT4 gene) starting from the OCT4 fully repressed state (n A O = 0) for different values of ūT .Bottom plots: %O A , that is, the normalized amount of N = 100 time trajectories which reach n A

6
Parameter values and motivation of parameter choices used to generate the plots in the main paper and SI

table 1 :
Parameter values associated with the plots in Figures3 -5.

table 3 :
Parameter values associated with the plots in Supplementary Figure3, Supplementary Figure4, and Supplementary Figure5.

table 6 :
Parameter values associated with the plots in Supplementary Figure8, Supplementary Figure10, and Supplementary Figure12.

table 7 :
Parameter values associated with the plots in Supplementary Figure13.

table 8 :
Parameter values associated with the plots in Supplementary Figure14and Supplementary Figure15.